Linear Systems and LU

Linear Systems

\[ \begin{aligned} Ax &= b \\ A &\in \mathbb{R}^{n \times n} \\ x &\in \mathbb{R}^n \\ b &\in \mathbb{R}^n \end{aligned} \]

Case 1: Solvable

\[ \begin{aligned} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} -1 \\ 1 \end{bmatrix} \end{aligned} \] Completely determined

Case 2: No solution

\[ \begin{aligned} \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} -1 \\ 1 \end{bmatrix} \end{aligned} \] Overdetermined

Case 3: Infinitely many solutions

\[ \begin{aligned} \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} -1 \\ -1 \end{bmatrix} \end{aligned} \] Underdetermined
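A quick numerical way to distinguish the three cases, comparing \(\operatorname{rank}(A)\) with the rank of the augmented matrix \([A\ |\ b]\) (a minimal sketch, assuming numpy is available):

```python
import numpy as np

def classify(A, b):
    """Classify Ax = b by comparing rank(A) with rank of the augmented matrix [A | b]."""
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_Ab > rank_A:
        return "no solution"
    if rank_A < A.shape[1]:
        return "infinitely many solutions"
    return "unique solution"

print(classify(np.eye(2), np.array([-1.0, 1.0])))                            # unique solution
print(classify(np.array([[1.0, 0.0], [1.0, 0.0]]), np.array([-1.0, 1.0])))   # no solution
print(classify(np.array([[1.0, 0.0], [1.0, 0.0]]), np.array([-1.0, -1.0])))  # infinitely many solutions
```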

No Other Cases

Proposition

If \(Ax = b\) has two distinct solutions \(x_0\) and \(x_1\), then it has infinitely many solutions.

Proof

\[ \begin{aligned} \text{For } t\in \mathbb{R}: \quad A(tx_0 + (1-t)x_1) &= tAx_0 + (1-t)Ax_1 \\ &= tb + (1-t)b \\ &= b \end{aligned} \] Since \(x_0 \neq x_1\), the points \(tx_0 + (1-t)x_1\) are distinct for distinct \(t\), so there are infinitely many solutions.

Dependence on Shape

Proposition

Tall matrices admit unsolvable right-hand sides.

Proof

\[ A \in \mathbb{R}^{m \times n},\ m > n: \text{ each of the } n \text{ columns lies in } \mathbb{R}^m, \text{ so } \dim \operatorname{col}(A) \le n. \\ \begin{aligned} n < m &\xRightarrow{} \exists v \in \mathbb{R}^m,\ v \notin \operatorname{col}(A)\\ &\xRightarrow{} Ax = v \text{ is unsolvable} \end{aligned} \]

Proposition

Wide matrices admit right-hand sides with infinitely many solutions.

Proof

\[ \begin{aligned} m < n &\xRightarrow{} \text{the columns are linearly dependent} \\ &\xRightarrow{} \exists y \neq 0 \text{ s.t. } Ay = 0 \\ &\xRightarrow{} A(x + ty) = Ax = b\ \ \forall t \in \mathbb{R}, \text{ whenever } Ax = b \end{aligned} \]

For this Lecture

All matrices will be:

  • Square
  • Invertible (nonsingular)

Foreshadowing

Even rounding the entries of a matrix can change it from non-invertible to invertible (but ill-conditioned). \[ \begin{bmatrix} 1 & 0 \\ 0 & \epsilon \end{bmatrix} \]
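For instance, the condition number of this matrix grows like \(1/\epsilon\) even though it stays invertible for every \(\epsilon \neq 0\) (a small sketch, assuming numpy):

```python
import numpy as np

for eps in [1e-2, 1e-8, 1e-15]:
    A = np.array([[1.0, 0.0], [0.0, eps]])
    # Invertible for any eps != 0, but the 2-norm condition number is 1/eps.
    print(eps, np.linalg.cond(A))
```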

Row Operations: Permutation

\[ \sigma: \{1, 2, \dots, m\} \rightarrow \{1, 2, \dots, m\} \\ P_\sigma := \begin{bmatrix} - & e_{\sigma(1)}^T & - \\ - & e_{\sigma(2)}^T & - \\ - & \vdots & - \\ - & e_{\sigma(m)}^T & - \\ \end{bmatrix} \]

\(e_{i}^TA\) is the \(i\)th row of \(A\).
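A small sketch of building \(P_\sigma\) and checking that \(P_\sigma A\) reorders the rows of \(A\) (numpy assumed; indices are 0-based in code):

```python
import numpy as np

sigma = [2, 0, 1]                    # permutation of {0, 1, 2} (0-based)
P = np.eye(3)[sigma]                 # stack the rows e_{sigma(i)}^T
A = np.arange(9.0).reshape(3, 3)

print(P @ A)                         # rows of A in the order prescribed by sigma
print(np.allclose(P @ A, A[sigma]))  # True: multiplying by P_sigma is just a row reordering
```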

Row Operations: Row Scaling

\[ S_a := \begin{bmatrix} a_1 & 0 & \dots & 0 \\ 0 & a_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_m \end{bmatrix} \]

"Scale row k by c and add to row l":

\[ (I+ce_le_k^T)A \]
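A quick check that \(I + ce_le_k^T\) really performs this operation (sketch, numpy assumed, 0-based indices):

```python
import numpy as np

m, c, k, l = 4, 2.5, 1, 3                 # "scale row k by c and add to row l"
A = np.random.default_rng(0).normal(size=(m, m))

e_l = np.zeros(m); e_l[l] = 1.0
e_k = np.zeros(m); e_k[k] = 1.0
E = np.eye(m) + c * np.outer(e_l, e_k)    # I + c e_l e_k^T

B = A.copy()
B[l] += c * A[k]                          # the same operation applied directly to the rows
print(np.allclose(E @ A, B))              # True
```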

Inverting Matrices

Common rule of thumb: Do not compute \(A^{-1}\) if you do not need it.

  • Not the same as solving \(Ax = b\).
  • Can be slow and poorly conditioned.
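In code this means preferring a factor-and-solve routine over an explicit inverse (a minimal sketch, numpy assumed; the random system is just an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 500))
b = rng.normal(size=500)

x_solve = np.linalg.solve(A, b)     # factorization-based solve: preferred
x_inv = np.linalg.inv(A) @ b        # explicit inverse: more work, typically less accurate

print(np.linalg.norm(A @ x_solve - b), np.linalg.norm(A @ x_inv - b))
```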

Steps of Gaussian Elimination

  1. Forward substitution: For each row \(i\):
    • Scale row \(i\) so its pivot becomes 1
    • For each \(j > i\), subtract a multiple of row \(i\) from row \(j\) to zero out the pivot column below the pivot
  2. Backward substitution: For each row \(i = m, m-1, \dots, 1\):
    • For each \(j < i\), subtract a multiple of row \(i\) from row \(j\) to zero out the rest of the pivot column

Total runtime: \(O(n^3)\)

There are n rows in total; each row must be processed against the \(O(n)\) rows below it, and updating each of those rows takes \(O(n)\) subtractions, for a total of \(O(n^3)\).
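A direct transcription of these steps on the augmented matrix \([A\ |\ b]\) (a minimal sketch with no pivoting, so it assumes every pivot is nonzero; numpy assumed):

```python
import numpy as np

def gauss_jordan_solve(A, b):
    """Solve Ax = b by the elimination steps above (assumes nonzero pivots)."""
    M = np.column_stack([A.astype(float), b.astype(float)])  # augmented matrix [A | b]
    n = len(b)
    # Forward pass: unit pivots, zeros below the diagonal.
    for i in range(n):
        M[i] /= M[i, i]                  # scale row i so the pivot is 1
        for j in range(i + 1, n):
            M[j] -= M[j, i] * M[i]       # zero out column i below the pivot
    # Backward pass: zeros above the diagonal.
    for i in range(n - 1, -1, -1):
        for j in range(i):
            M[j] -= M[j, i] * M[i]
    return M[:, -1]

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(gauss_jordan_solve(A, b), np.linalg.solve(A, b))   # the two should agree
```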

Pivoting

Permute rows and/or columns to avoid dividing by small numbers or zero.

  • Partial pivoting (swap rows only)
  • Full pivoting (swap rows and columns)
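A classic 2×2 illustration of why this matters (a sketch, numpy assumed): eliminating with the tiny pivot \(\epsilon\) wipes out the answer, while swapping the rows first does not.

```python
import numpy as np

eps = 1e-17
A = np.array([[eps, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])                   # exact solution is very close to x = y = 1

# No pivoting: eliminate using the tiny pivot eps.
m = A[1, 0] / A[0, 0]                      # huge multiplier 1/eps
y = (b[1] - m * b[0]) / (A[1, 1] - m * A[0, 1])
x = (b[0] - A[0, 1] * y) / A[0, 0]
print(x, y)                                # x comes out 0.0: catastrophically wrong

# Partial pivoting: swap the two rows so the pivot is 1, then eliminate.
A2, b2 = A[::-1], b[::-1]
m2 = A2[1, 0] / A2[0, 0]
y2 = (b2[1] - m2 * b2[0]) / (A2[1, 1] - m2 * A2[0, 1])
x2 = (b2[0] - A2[0, 1] * y2) / A2[0, 0]
print(x2, y2)                              # close to 1.0 1.0
```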

LU

Observation: Triangular systems can be solved in \(O(n^2)\) time
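Minimal forward- and back-substitution routines, each \(O(n^2)\) (a sketch, numpy assumed):

```python
import numpy as np

def forward_substitution(L, b):
    """Solve Lx = b for lower-triangular L in O(n^2)."""
    x = np.zeros(len(b))
    for i in range(len(b)):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def back_substitution(U, b):
    """Solve Ux = b for upper-triangular U in O(n^2)."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[2.0, 0.0], [1.0, 3.0]])
print(forward_substitution(L, np.array([2.0, 4.0])))   # [1.0, 1.0]
```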

\[ E_m \dots E_2E_1Ax = E_m \dots E_2E_1b \] \(E_m \dots E_2E_1 = L^{-1}\) is lower triangular.

  • row scaling - diagonal
  • forward substitution - \(I + ce_le_k^T,\ l > k\) - lower triangular

\(L^{-1}A\) is upper triangular. \[ L^{-1}A = U \]
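A minimal LU factorization without pivoting; this sketch uses the common convention that the multipliers go into a unit-lower-triangular \(L\) instead of scaling rows (numpy assumed, nonzero pivots assumed):

```python
import numpy as np

def lu_no_pivot(A):
    """Return unit lower-triangular L and upper-triangular U with A = L U."""
    U = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n):
        for l in range(k + 1, n):
            c = U[l, k] / U[k, k]     # multiplier that zeroes out U[l, k]
            U[l] -= c * U[k]          # the elimination step E = I - c e_l e_k^T
            L[l, k] = c               # E^{-1} = I + c e_l e_k^T accumulates into L
    return L, U

A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))          # True
```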

Why is \(L^{-1}\) triangular?

\[ S:= diag(a_1, a_2, \dots) \\ S^{-1} = diag(\frac{1}{a_1}, \frac{1}{a_2}, \dots) \\ E := I + ce_le_k^T, l > k \\ E^{-1} = I - ce_le_k^T, l > k \\ \]

Proposition: Product of lower triangular matrices is LT

proof: \[ \text{Take } A, B \text{ lower triangular} \xRightarrow{} A_{ij} = B_{ij} = 0\ \ \forall j > i \\ \text{Define } C = AB \text{ and suppose } j > i: \\ C_{ij} = \sum_{k} A_{ik}B_{kj} = \sum_{k \le i} A_{ik}\underbrace{B_{kj}}_{=0,\ j > i \ge k} + \sum_{k > i} \underbrace{A_{ik}}_{=0}B_{kj} = 0 \]

Pivoting by Swapping Columns

\[ (E_k \dots E_2E_1)A(P_1 \dots P_{l-1}) = U \\ \xRightarrow{} A = LUP \] (the \(E_i\) are elimination matrices, the \(P_j\) are column permutations, \(L^{-1} = E_k \dots E_2E_1\), and \(P = (P_1 \dots P_{l-1})^{-1}\)) \[ Ax = b \\ LUPx = b \\ x = P^TU^{-1}L^{-1}b \]
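In practice one calls a library LU; note that scipy's routines pivot rows (\(PA = LU\)) rather than columns as above, but the factor-once, solve-many pattern is the same (a sketch, assuming scipy is available):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
b = rng.normal(size=5)

lu, piv = lu_factor(A)        # one O(n^3) factorization with partial (row) pivoting
x = lu_solve((lu, piv), b)    # each additional right-hand side costs only O(n^2)
print(np.allclose(A @ x, b))  # True
```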

Linear regression

Normal equations (for the least-squares problem \(\min_x ||Ax - b||_2^2\)): \(A^TAx = A^Tb\)

\(A^TA\) is the Gram matrix of \(A\)
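A least-squares sketch comparing the normal equations with a library solver (numpy assumed; since the Gram matrix squares the condition number, `lstsq` is usually preferred for ill-conditioned problems):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))      # tall matrix: overdetermined system
b = rng.normal(size=100)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # QR/SVD-based solver
print(np.allclose(x_normal, x_lstsq))             # True on this well-conditioned example
```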

Regularization

  • Tikhonov regularization ("ridge regression"; Gaussian prior), see the sketch after this list: \[ \min_x ||Ax - b||_2^2 + \alpha ||x||_2^2 \]

  • Lasso (Laplace prior):

\[ \min_x ||Ax - b||_2^2 + \alpha ||x||_1 \]

  • Elastic net:

\[ \min_x ||Ax - b||_2^2 + \alpha ||x||_1 + \beta ||x||_2^2 \]
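Of the three, only Tikhonov/ridge has a closed form, via the modified normal equations \((A^TA + \alpha I)x = A^Tb\); lasso and elastic net require iterative solvers. A minimal sketch (numpy assumed):

```python
import numpy as np

def ridge(A, b, alpha):
    """Minimize ||Ax - b||_2^2 + alpha ||x||_2^2 via (A^T A + alpha I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
print(ridge(A, b, alpha=0.1))
```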

Example: Image Alignment

\[ y_k = Ax_k + b \\ A \in \mathbb{R}^{2 \times 2}, b \in \mathbb{R}^2 \\ \min_{A,b} \sum_{k=1}^n ||Ax_k + b - y_k ||_2^2 \]
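This objective is linear least-squares in the six unknowns of \(A\) and \(b\); one way to set it up (a sketch with synthetic points, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                          # points x_k, one per row
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
b_true = np.array([0.5, -1.0])
Y = X @ A_true.T + b_true                              # targets y_k = A x_k + b

M = np.hstack([X, np.ones((len(X), 1))])               # rows [x_k^T, 1]
theta, *_ = np.linalg.lstsq(M, Y, rcond=None)          # solves M theta ~ Y, theta is 3x2
A_est, b_est = theta[:2].T, theta[2]
print(np.allclose(A_est, A_true), np.allclose(b_est, b_true))  # True True (noise-free data)
```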

Example: Mesh Parameterization

\[ G = (V, E) \\ \textrm{Variable: } x_i \in \mathbb{R}^{2}, \textrm{ the position of vertex } i \\ \min_{x_1, \dots, x_n} \sum_{(i,j) \in E} ||x_i - x_j||_2^2\\ \textrm{s.t. } x_i \textrm{ fixed } \forall i \in \textrm{boundary} \]

  • "Tutte parameterization"
  • "Harmonic parameterization": A secret differential equation model hiding in the objective function \[ \int ||\nabla f||^2 \]